Markov chain Monte Carlo for exact inference for diffusions 



Sermaidis, G.* Papaspiliopoulos, O.I Roberts, G.O.fBeskos, A.^and Fearnhead, P.* 



£NJ ' Abstract 

»j | We develop exact Markov chain Monte Carlo methods for discretely-sampled, directly and indirectly ob- 

. served diffusions. The qualification "exact" refers to the fact that the invariant and limiting distribution of the 

Markov chains is the posterior distribution of the parameters free of any discretisation error. The class of pro- 
cesses to which our methods directly apply are those which can be simulated using the most general to date exact 
simulation algorithm. The article introduces various methods to boost the performance of the basic scheme, in- 
cluding reparametrisations and auxiliary Poisson sampling. We contrast both theoretically and empirically how 
this new approach compares to irreducible high frequency imputation, which is the state-of-the-art alternative for 
the class of processes we consider, and we uncover intriguing connections. All methods discussed in the article 
are tested on typical examples. 
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to ; 1 Introduction 

■ Diffusion processes provide a flexible framework for modelling phenomena which evolve randomly and continu- 

\ ously in time and are extensively used throughout Science, e.g. in finance (Ait-Sahalia and Kimmel, 2007), biology 

(Golightly and Wilkinson, 2006), molecular kinetics (Horenko and Schiitte, 2008), pharmacokinetics/pharmaco- 
dynamics (Picchini et al, 2010) and spatio-temporal modelling (Brown et al., 2000). 

A time-homogeneous diffusion process V e R rf is a Markov process defined as the solution to a stochastic 
differential equation (SDE): 



dV s = y8(V s ; 0i)ds + cr(V s ; 2 )dW s , V = v,s>0, (1) 



where W is a c/-dimensional standard Brownian motion. The functions f3 : W 1 x ©i — > E. d and cr : R d x ©2 — > R 
are known as the drift and diffusion coefficient respectively, and are allowed to depend on an unknown parameter 
9 — (6\,62) G © c W. The assumption of distinct parameters for each functional is by no means restrictive and 
is adopted here for ease of presentation. We assume that cr is invertible and make the usual set of assumptions on 
(3 and cr to ensure that (1) has a unique weak non-explosive solution, see for example Theorem 5.2.1 of ©ksendal 
(2003); see also Section 2. 

Even though the process is defined in continuous time, the available data consist of observations recorded at a 
set of discrete time points, 

Y:={V t0 ,V tl ,...,V tn ], 0</ </i <...</„. 

Statistical inference is pursued in a Bayesian framework where prior beliefs about the parameters, encoded via a 
prior density 7t(9), are updated on the basis of the available data through the discrete-time likelihood to yield the 
posterior beliefs, encoded via the posterior density: 

n 

7r(6 I Y) oc n{6) \\ P^OV. - °) - ( 2 ) 
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where Af,- = f, - f,_i is the time increment between consecutive observations and 

p t (v, w; 0) = P(V, e dw | V = v)/dw, t>0,v,we R d , 

is the transition density of the process. 

Bayesian and generally likelihood-based inference in this context is hindered by the unavailability of the 
transition density and sufficiently accurate approximations to the density exist only when t is sufficiently small. 
One strand of the literature approaches the inference problem by resorting to Monte Carlo data augmentation 
(DA), according to the following principle. First, a DA scheme is constructed by identifying auxiliary variables 
such that the joint density of those and the observations, known as complete likelihood, is analytically available. 
Subsequently, inference is performed by employing a Markov chain Monte Carlo (MCMC) algorithm which 
targets the posterior density of parameters and auxiliary variables. Early DA schemes were based on imputation 
of a finite number of points, say M, of the latent diffusion bridges {V s , s e (f,-i, f,)}, see for example Eraker (2001); 
Elerian etal. (2001). The complete likelihood is still intractable but can be reasonably approximated using an Euler 
scheme which now operates on smaller time increments. The bias introduced in this approximation is eliminated 
by increasing M. There are three serious challenges with this approach. First, the simulation of the latent bridges 
conditionally on the parameters; this simulation is required in the "Imputation" step of a DA algorithm. This 
problem has been intensively studied, see for example Papaspiliopoulos and Roberts (2012) for a recent account. 
Second, the choice of M, at least in practice. A good approximation usually requires a large value of M which is 
typically found by repeated runs of the algorithm until the estimated posterior distributions show no change. This 
adds a substantial computational burden. Third, Roberts and Stramer (2001) showed that when 02 is unknown the 
mixing time of the MCMC algorithm is O(M). This is due to the quadratic variation identity, according to which 
any continuous-time path contains infinite information about On . Thus, in these early DA schemes reduction in bias 
comes with unbounded increase in the Monte Carlo variance. Roberts and Stramer (2001) constructed appropriate 
path-parameter transformations in order to yield a working DA scheme which is valid even in the limit M — > oo 
unlike those of the previous generation. We refer to the limiting case M — > oo as path augmentation (PA). An 
MCMC algorithm based on PA is not implementable in practice, since it involves infinite amount of computation. 
For finite M, we refer to the DA scheme as high frequency augmentation (HFA) and to the MCMC algorithm 
which targets it as approximate MCMC (AMCMC), the term reflecting the fact that bias is introduced due to the 
discretisation of the paths. More details on these schemes are given in Section 3.1. 

A new generation of Monte Carlo schemes for diffusions was initiated with the introduction of the exact 
algorithm (EA) for the exact simulation of non-linear diffusions. The potential of using the EA to build an MCMC 
algorithm for parameter estimation was sketched in Beskos et al. (2006b) for a restrictive class of univariate 
diffusions. 

In this paper we present a novel augmentation scheme, called exact data augmentation (EDA), and develop 
MCMC algorithms for all diffusions which can be simulated under the broader framework of the so-called EA3 
(Beskos et al, 2008). This generation of MCMC algorithms based on EDA is referred to as exact MCMC (EM- 
CMC), and is such that their equilibrium distribution is the exact posterior distribution of the parameters, i.e., free 
of any discretisation error. We enhance algorithmic performance by designing noncentred reparametrisations and 
extend our methods to the case of indirect observations where interest lies in estimating both parameters and the 
latent diffusion process. A further contribution of this work is a theoretical investigation of the connection between 
EDA and PA. First, it is shown that EDA augments more information than PA. This is rather surprising since the 
former appears to augment only a small finite-dimensional distribution of the missing paths whereas the latter in 
principle augments continuous paths and in practice high-frequency approximations thereof. The key is that the 
extra augmentation in EDA creates conditional independence relationships which are exploited to apply an algo- 
rithm which targets an infinite-dimensional state using finite computation. This result also suggests that AMCMC 
for the same amount of computation is expected to mix faster than EMCMC; this is effectively another instance 
of the bias-variance tradeoff. This connection motivates a further observation which links the two approaches and 
suggests a way to improve the convergence rate of EMCMC using auxiliary Poisson sampling. 

A comment on the applicability of the methods proposed here is due. The methods rely on a variance- 
stabilising transformation after which the diffusion has constant diffusion matrix and drift which is of gradient 
form, see Section 2. 1 for details. The transformation poses little limitations for univariate processes, but might not 
even exist for general multivariate SDEs with coupling in the diffusion. On the other hand, multivariate processes 
with gradient drift structure and no coupling in the diffusion are rather standard in the framework of physical sys- 
tems. Note that the variance-stabilising transformation is necessary for the HFA approach as well. In summary, 
the technology we develop here is not directly applicable to general stochastic volatility models, say, although 
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exploiting particular structures can push considerably these limitations, see for example Kalogeropoulos et al. 
(2010). Additionally, advances in the exact simulation of diffusions, as for example in Etore and Martinez (201 1); 
Goncalves and Roberts (2012) would eo ipso lead to exact MCMC methods following the framework of this arti- 
cle. Irreducible DA schemes avoiding this transformation are also currently under investigation, see for example 
Golightly and Wilkinson (2008). 

The article is structured as follows. Section 2 contains the background on assumptions, notations and recalls 
the EA. Section 3 presents formally the EDA and contrasts it to PA. Section 4 describes noncentred reparametri- 
sations of the EDA and auxiliary Poisson sampling from improving algorithmic performance. Section 5 discusses 
extensions to indirect observations. Section 6 carries out a careful and extensive numerical comparison of several 
schemes. Section 7 closes with a discussion and the Appendix contains the proofs of main results. 



2 Preliminaries 



In this section we collect some necessary background. In terms of notation, x' 1 ' or x^ denote the ;th or (i, j)th 
element of a vector or matrix x, det[x], x T and x denote the determinant, transpose and inverse of a matrix x 
where appropriate, and Id denotes the d x d identity matrix. For two vectors x and y, we define vectors x and y 
such that jc' 1 ' = x' 1 ' A y' 1 ' and y' 1 ' = x' 1 ' V y' 1 '. The Euclidean norm is denoted by ||.||. V A and A. v denote the Jacobian 
matrix and Laplacian operators respectively, that is if x € R d , then for functions f\ : R d — > R" 1 and f 2 :R d ^ R 



V,/i(x) = 

We define D : R d x 2 -> R as D = 
coefficient. For function f : R d ■ 
(l)by 



/Ax) 



dxW 



i=l,...,d;j=\,...,m 



A. v / 2 (x) = J] 



d 2 f 2 (x) 
f 3(xW) 2 



Idettcrjr 1 and y : R d x 2 



lidxd 



as j — crcr T , where cr is the diffusion 
twice continuously differentiable on its domain, we denote the generator of 



d 2 f(v) 



Finally, TV, (u) denotes the density of a Gaussian random variable with mean vector and covariance matrix tlj 
evaluated at u 6 R d . 



2.1 Reducible diffusions of gradient type 

The methods in this paper rely on the existence of a transformation rj, known as Lamperti transformation, such 
that T]{V S ; 82) solves an SDE with constant diffusion matrix. This transformation can be obtained for univariate 
diffusions rather trivially. For multivariate processes its existence is a subtle matter. In the elliptic case a sufficient 
condition is 

[LAM]: [V v tj(v;6 2 )] T = a-\v;6 z ), 

which can be simplified to yield explicit conditions on the elements of cr -1 ; see for example Ait-Sahalia (2008). 
In the rest of the article we will assume the existence of this transformation and denote its inverse by 7]~ l . If 
X s '•= i](V s ', 62) is the transformed diffusion, then by Ito's formula X solves 

dX s = a(X s ; 6)ds + dW s , X = x = tj(v; G 2 ), s > 0, (3) 

where a : R d x -> R d , with 

a [k) (u;6) = A g T] m {r}- l (u;02);62}, k=\,...,d. 

If pt(x, z\ S), x, z G R d is the transition density of X, then the transition density of V can be expressed as 

p,(v, w; 9) = D(w; 2 )p, {r/(v; 2 ), tj(w; 2 ); 0} . (4) 

The methodology also requires certain conditions on the drift a of the transformed process. The following set 
of assumptions should hold for any e 0: 
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[SMOOTH]: a {k} (-;0) is continuously differentiable for k = l,...,d. 

[GRAD]: There exists H : R d x -> R such that V x H(x; 0) = a(x; 0). 

[LBOUND]: There exists 1(0) > -oo, such that 1(0) < inf rf \ {\\a(u; 0)\\ 2 + A x H(u; 0)]. 

The first is a very weak condition and the third is rather mild too. The second identifies X as a diffusion of 
gradient-type, where H is called the potential function. When the diffusion is ergodic, its invariant log-density 
can be expressed directly in terms of H. This condition is trivially satisfied for univariate processes but is more 
restrictive for multivariate ones. Finally, we require that a is such that the probability law generated by the 
solution of (3) is absolutely continuous with respect to the Wiener measure. A particularly useful and weak set 
of conditions are given in Rydberg (1997); in the case of (3) if a is locally bounded the conditions simply require 
that the SDE be not explosive. 



2.2 Exact simulation of diffusions 

The EA is a rejection sampling algorithm on the space of diffusion paths, which uses Brownian path proposals 
and delivers the diffusion path revealed at a finite collection of random points. The path can be filled in later with 
no further reference to the target process. The main attraction of the algorithm is that the draws are from the exact 
finite-dimensional distribution. Here we focus on exact diffusion bridge simulation, i.e., obtain samples from (1) 
conditionally on the origin Vo = v and terminal point V, = w. It turns out that this conditional simulation is really 
the key to DA methods for parameter estimation. 

The target process ( 1 ) is transformed into one of unit diffusion matrix as described in Section 2.1. The problem 
is therefore reduced to the simulation of (3) conditionally on the origin x = x(02) :— J](v; 02) and terminal point 
y - y(fh) '■- n( w ' An X-bridge yields a V-bridge by applying the inverse transformation. Let Q? denote 
the law of the X-bridge starting at x and terminating at y at time t, and Wl' the law of a Brownian bridge 
conditioned on the same endpoints. The following lemma, which is a restatement of Lemma 1 in Beskos et al. 
(2006b), derives the density of the target law with respect to the Brownian bridge law. 

Lemma 1. The law Q? is absolutely continuous with respect to Wx with density 



dQ ( W ) Nt (y-x) 



6Wf ' y) Pt(x,y,0) { 2 Jo 1 ' J 

where <p : R d x -> R + is defined by 

(/> (u; 8)=^ {\\a(u; 0)\\ 2 + A,H(u; 0)) - 1(0). 

The EA is based on recognising (6) as the probability of a specific event from an inhomogeneous Poisson 
process of intensity <p (X s ; 0) on [0, t\. Such processes can be simulated by constructing an upper bound for the 
variable intensity and using Poisson thinning. Assume that there exists a finite-dimensional random variable 
L := L(X) and a positive function r such that 

r(L; 0) > sup (p (X s ; 0) , 

S€[0,t] 

and let <D = {*P, T) be a homogeneous Poisson process of intensity r(L\ 0) on [0, t] x [0,1], with uniformly dis- 
tributed points *P — , . . . , iff K } on [0, t] and marks T = {u\, . . . , u K ] on [0, 1], where k ~ Po[r(L; 0)t]. If ./V is the 
number of points of i> below the graph s — > <p (X s ; 0) / r(L; 0), then 



Jo 



P (N = | X) = exp { - \ (p (X s ; 0) ds 

This implies a rejection sampler where a proposed path X ~ W? is accepted as a path from Q? according to 
the indicator 

K 

I(L, X, (D, v, w, 0) := [1 1 [</> [X^r, 6) /r(L; 0) < uj\ . (7) 
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The EA output is the collection \L(X), O, S(X)}, where 

5 (X) := {(0, X ), (t/fuX^),..., {iff K , (*, X,)} 

is a skeleton of the accepted path. The algorithm is presented in Algorithm 1 . The technical difficulty that underlies 
the implementation of the EA is the simulation of L{X), and primarily the conditional simulation of a Brownian 
bridge given L{X) for the evaluation of (7). This has led to the construction of three EAs that share the rejection 
sampling principle, but have a different range of applicability. 



Algorithm 1 EA for diffusion bridges 

1 : simulate L = L(X) , where X ~ wf x - y) , 

2: simulate k ~ Po [r(L; 6)t] and <D = {(i/rj, vj)}j, 1< j<K uniformly on [0,f]x[0, 1], 
3: conditionally on L sample Brownian bridge Xp,, 

4: evaluate / = I(L, X, fl>, v, w, 6) as in (7); if /= 1 then return {L,<S>,S(X)\, otherwise go to 1. 
2.2.1 The family of EAs 

Each EA exploits the specific structure of the drift to construct L(X). The EA1 (Beskos et al, 2006a) is the 
simplest EA type and its framework is restricted by 

Condition 1. (p(-;ff) is bounded above. 

This condition ensures that r(L; ff) = r(8), implying there is no need for constructing L(X). As a consequence, step 
3 of Algorithm 1 merely requires simulation of a Brownian bridge at time instances ifri, . . . , iff K . The EA2 (Beskos 
et ah, 2006a) is applicable only when d — 1 and relaxes Condition 1 to a more mild one: 

Condition 2. Either lim sup i( ^ M <p (u; 6) < oo or lim sup i( ^_ co (p (u; ff) < oo. 

For simplicity consider only the first case. The algorithm constructs the proposed path by first simulating its 
minimum, say m, and subsequently the remainder of the path conditioned on m. In this setting L(X) is defined as 
the two-dimensional random variable L(X) = {m, t}, where t is the time instance the minimum is attained. Then, 
the required upper bound is found as r(L; ff) = sup 1( {0 (u; 0);u > m). Simulating a Brownian bridge conditionally 
on its minimum is based on a path transformation of two independent Bessel bridges, see Beskos et al. (2006b) 
for more details. 

The EA3 (Beskos et al, 2008) poses no upper boundedness conditions. For the moment assume d — 1 and 
consider a path X with initial point x and terminal point y at time t . The algorithm is based on creating a partition 
on the path space using a series of lower and upper bounds. For a user-specified constant 5 > yff/3 the partition 
consists of the sets C(e, x,y), e e N*, defined by 



where x,y as defined in the beginning of Section 2. 

In the multidimensional case, sets Ck(e) :— C(e, x^,y^) are constructed for each coordinate xf\ k — l,...,d, 
and L(X) is defined as the li-dimensional discrete random variable L(X) = (L' 1 ', . . . , L^) where L' ' = q, q g N*, 
if X e n d k=l C k (e k ). Hence, {L ik} < e k , k = 1, . . . ,d) = {x w - e k 6 < xf 1 < y ik] + e k 8, < s < t, k = 1, . . ,,d). Figure 
la illustrates the construction for an arbitrary coordinate. 

The random variable L is referred to as the Brownian bridge layer and a Brownian bridge path conditioned on 
this layer as the layered Brownian bridge. Conditioned on L, and using the continuity of $>(■; ff), the Poisson rate 
can be found as 



The exact mathematical and implementation details of sampling a layered Brownian bridge can be found in the 
original paper. 




C(e, x,y) 



A(e, x,y) U B(e, x,y), 



r(L; ff) = sup [<f> (u; 0) ; u w e (Jf w - L [k] 5,y {k] + L [k] S), k = 1, . . . ,d] . 



5 



2.2.2 Computational considerations 

The computational performance of EA depends on its acceptance probability. For any two fixed points x and y, 
expression (6) implies that the probability of accepting a proposed path is 

a(x,y, t,0) := E^w) 

thus suggesting that the acceptance probability decays exponentially to as d or t increase. 

Finally, we note that although EA3 has the widest framework of applicability, EA1 and EA2 should still be 
preferred whenever possible; simulating a layered Brownian bridge is a non-trivial task and is achieved by means 
of rejection sampling, thus adding to EA3 an extra level of computational complexity. In particular, an extensive 
empirical study by Peluchetti and Roberts (2008) suggests as a rule of thumb that EA3 is approximately 10 times 
slower than EA1. 

3 Data augmentation for discretely observed diffusions 

In this section we describe irreducible DA approaches for parameter estimation and present formally our EDA 
scheme. The auxiliary variables involved in EDA are intimately related to the EA output for diffusion bridges 
and lead to MCMC algorithms which involve no approximation to the statistical model of interest. At first, EDA 
appears totally different from the PA paradigm of Roberts and Stramer (2001). However, there are close but 
subtle links and the presentation in this section has been structured to naturally bring those out. Effectively, the 
PA scheme, which is recalled below, leads to two possibilities. One is its approximation by an HFA with some 
finite M, which leads to bias. The other is to consider an EA for the simulation of the auxiliary process identified 
by Roberts and Stramer (2001) and identify finite-dimensional auxiliary variables which then lead to the EDA 
scheme. Section 3.2 identifies those variables and Section 3.3 gives a theorem which establishes the connection 
between PA and EDA. 

3.1 Irreducible path imputation and finite-dimensional approximations 

We first outline the PA approach of Roberts and Stramer (2001). This scheme corresponds to the limiting case 
M — oo, and leads to an idealised, yet impossible to implement, MCMC algorithm which involves imputing 
continuous path trajectories. The auxiliary processes are obtained after two path-parameter transformations of 
the original latent bridges. We then review the HFA scheme which is constructed by approximating the PA 
scheme with some finite M. The HFA scheme requires only condition [LAM], whereas EDA additionally requires 
[SMOOTH], [GRAD] and [LBOUND] for being able to employ EA. However, HFA can be considerably improved 
when [SMOOTH] and [GRAD] are also satisfied. 

Consider for the moment only two observations from the diffusion process, Vq - v and V, = w. The process 
is first transformed as V s — > X s = r](V/, 62) as in Section 2.1. X-paths over bounded time increments only contain 
finite information for 9, since all parameters now relate with the drift, see (3). The transformed path starts at x{02) 
and terminates at y(02), which are both deterministic functions of #2 and the observations. This suggests that a DA 
scheme based on X will not work when 62 is unknown, since a realisation of X determines 62 through its endpoints. 
An alternative way to see the problem is to note that the collection of dominating measures {WT , 9 e ®} are 
mutually singular, and therefore a Gibbs algorithm based on this augmentation would be trapped in the support of 
one of these measures. This necessitates a further reparametrisation from X s — > X s , where 

X s := X s - (l - ^x(0 2 ) - - t y(6 2 ), s e [0, f], (9) 

which forces the path to start and finish at and essentially transforms the distribution ' x ' y) so that the dominating 
measure, now given by W ( ''°' 0) , is independent of 8. 

The PA is based on imputing X. Accounting for all observations, let x,(#2) := t](V ti ; 82) and denote by X, = 
{Xis, s € [0, Af,]}, i = l,...,n the imputed paths. We introduce 

G, (X- 0) = exp |JT a T (X s ; 0)dX s - I jT \\ a (X s ; 0)|| 2 dsJ , fx iiS (6) = |l - ^-J x w (fc) + ^(#2), 



exp<- 



>(X s ;0)ds 



(8) 
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and note that the inverse transformation of (9), Xj — > Xj is given by Xj tS + f/,j tS (9). Then the joint posterior density 
of 9 and imputed paths, 7Tpa(9, {Xj, 1 < i < n} | Y), is proportional to 

n 

n{9) Y] D(V ti ; e 2 )N Ati {x t (G 2 ) - Xj^{9 2 )) G A , 9); 9) , (10) 

;=i 

with respect to Leb p ®;' =1 w (A,i - a0) , where gi(Xr,0) := {l, Vs + ^t, s {ff), s e [0, Af,]}. For a detailed derivation, the 
reader is referred to Section 3 of Roberts and Stramer (2001) . 

The joint posterior density is not be computable since the augmented paths cannot be represented by a finite 
number of variables, hence the integrals cannot be computed. Instead, the paths are approximated by vectors of 
size M + 2, {X,-, ; Ar,/(M+i)> J ' - 0, . . . , M + 1), and the integrals are approximated numerically, typically by Riemann 
sums, to yield n HFAM {9, {X h 1 < i < n] | Y), where by an abuse of notation we let X, denote the path and its 
discretisation. This introduces a bias in the inference for 9. The approximated posterior is targeted by an MCMC 
algorithm which updates in turns 9 and Xj, i = l,...,n according to their conditional densities. Crucially for 
the efficiency of the algorithm, the auxiliary processes Xj are independent over i conditionally on 9 and Y, and 
thus can be updated sequentially. Each update is typically performed by proposing Brownian bridge skeletons 
and accepting them according to (10). Algorithm 2 is a typical AMCMC implementation, where updates of 9 are 
obtained from a Metropolis-Hastings step with proposal kernel q. 

Algorithm 2 AMCMC 

1: Choose^ , set 6, ■ = Af,-/(M + 1) and X° = {x? } ., < ; < M + 1 , \<i<n. Setr = 0. 
2 : For 1 < i < n , < < M + 1 

3: simulate X*={x* js ). from W <A '" 00) , and sample E/ ~ Un(0, 1), 

4: if 

n HFAM {X1 | e>, Y) 

< mFAM^i I e<, Y) 

then set X' +l = X' , else set X' +1 = X< . 
5: Sample 8'~q(ff,-) and E/ ~ Un(0, 1), 
6: if 

v g nHFA,M(8'AX' ! +[ , 1 < i < n) | 7) gCe'.eQ 
< ^HfA, M (e, 1 < i < n\ | ¥) q(9>,0>) 

then set 6> f+1 =0*, else set 6» f+1 =0'. 
7 : Set t = t + 1 and go to 2 . 



An alternative approximation to 7Tpa exists if conditions [SMOOTH], [GRAD] and [LBOUND] are satisfied. In 
particular, by using integration by parts we can transform the stochastic integrals in G^ tj into time integrals, and 
rewrite (10) as 

n(0) exp [H{x n {0 2 ); 0} - H{x (9 2 ); 0} - l(9)(t„ - t )] 

x PJ D(V ti ; 9 2 )M At , {xj{9 2 ) - Xj^(9 2 )} exp |- jT ' (Z Us + ^ s (6)l &) d^J . (11) 

The finite-dimensional approximation to (11), denoted by tchfam-> w iU typically be less biased than hhfa.Mi as 
illustrated in Section 6. The relative AMCMC algorithm follows along the lines of Algorithm 2 by replacing 
Khfa.m with 7Thfa,m- 

3.2 Exact Data Augmentation and MCMC 

The main contribution of this section is to demonstrate that an exact rejection sampling algorithm for simulating 
diffusion bridges implies appropriate auxiliary variables which can be used to design EMCMC algorithms for 
exact inference for diffusions. We describe the data augmentation and the EMCMC algorithm which corresponds 
to the EA3 case, and later comment on the simplifications that arise when a more basic EA is applicable to the 
model of interest. 
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(a) (b) 




Figure 1: (a) The jfcth coordinate of a Brownian bridge path X starting and terminating at 0. In this example the event C*(2) has occurred, (b) 
The transformed process If 1 + (1 - s/t) x m Wi) + sy iki (6 2 )/t starts at x ik] (6 2 ) and terminates at y {k] (6 2 ), with x m (9 2 ) < y {k] (8 2 ). The dashed 
lines provide a lower and upper bound for the coordinate of the transformed path. 



The augmentation is first identified for a pair of consecutive observations, Vo = v and V, — w, and is then 
extended to an arbitrary number of observations by using the Markov property. This is achieved using the two 
main tools developed so far. First, the two path-parameter transformations for irreducible DA; recall that these are 
V — > X and X — > X, which can be written in one step as 



X s = Jj(V s ; 2 ) - (l - y ) ri(v, 2 ) - -^(w; 2 ), s e [0, t] . 



(12) 



Second, the EA for simulating an X-bridge according to 



2.2. Let Qi denote the measure induced by the linearly transformed bridge X, when X is drawn from 
passing, recall that when X is drawn from W? , X is distributed according to W ( ''°' 0) . 

We first sketch an EA for simulating from Qg* using proposals from W ( ''°' 0) . This is a minor modification of the 
EA forX, since a proposed path X ~ W^' ' - 1 is accepted as a path from Q { g if and only if X s + (1 - s/i)x+ sy/t, s e 
[0, t], is accepted as a path from Q? . Let L denote the layer of a Brownian bridge that starts and terminates 
at 0, and denote by M w the joint law of (L,X); hence if (L,X) ~ M (f) , marginally X ~ W ( '- ' 0) . The pair L,X 
imply realisations for the corresponding variables in the X-space. These are easy to obtain, since conditionally on 
a given L, [Xf\ s 6 [0, t]} moves within (-D k) 6,L {k) 6), and thus 



with x = x{02) and y = y{02) as in Section 



In 



(0 2 ) - L w 6 < Xf> + 



(0i) + -y m (02) < y m (0 2 ) + L m 6, 



where we recall that x w (0 2 ) = x {k] (0 2 ) A y {k] (0 2 ) and y {k] (02) = x {k] (0 2 ) V y w (0 2 ). The above construction and the 
derivation of the bounds are illustrated in Figure 1 . 



An EA which samples from Q *' follows easily. If <D> = {*P, T} is a homogeneous Poisson process of intensity 



r(L; 0) on [0, t] x [0, 1], where *P is the projection of the points on [0, t] and T the projection on [0, 1], and 

r(L; 0) = sup (u; 0) ; u {k] e (x {k] (0 2 ) - L m S,y {k) (6 2 ) + L m 6), k=\,...,d), 
then the algorithm accepts the proposed (L, X, O) according to 

1 



I(L,X,Q>,v,w,0) := Y\ 



./=! 



r(L;0) 



X„ ; +\l 



^Ax(0 2 )+ i ^y(0 2 y,0\ 



(13) 



(14) 



which is the familiar indicator function (7), reformulated in terms of (L, X, <J>). 

The EDA scheme for a pair of observations Vo, V, is now defined by the random variables (L, X, *¥). Note that 
we can avoid augmenting T and still obtain a tractable density. The density of the auxiliary variables is derived in 
the following lemma proved in the Appendix. 



Lemma 2. Let (L,X) ~ M ( '' and *F be a homogeneous Poisson process of intensity r(L\ 0) on [0, f\. If I is the 
acceptance indicator in ( 14), then the conditional density of (L, X, *¥) given 1=1, n(L, X, *P | u, v, 0), is 



r(L; 0) K 
a(x,y, t,0) 



expj/[l -r(L- 69)]}]" 



7=1 



1 



l-y)x(69 2 ) + ^y(0 2 );0 



(15) 
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with respect to the product measure M ( ' X W>, where W' is the measure of a homogeneous Poisson process on 
[0, t] with unit intensity, and a(x, y, t, 8) is the acceptance probability of the EA. 

Extending the augmentation scheme to account for all observations is straightforward. Specifically, recall that 
Xi(8 2 ) = tl{V ti ; 8 2 ) and let Z,, Xj = {Xi S , s e [0, Af,]} denote the accepted elements of EA applied to the interval 
[/,-_!, tj], for 1 < i < n. The Markov property of the diffusion process implies that the bridges conditionally on the 
observations are independent, and thus n({Lu Xj, 1 < i < n) \ Y, 8) = Yi" =l 7r(Z ; , X t , | V tj _, , V ti , 8) . 

We complete the development of EDA with the following theorem which specifies the joint density of data, 
auxiliary variables and parameters. This has a simple computable form and it admits the target posterior n(8 \ Y) 
as a marginal with respect to the auxiliary variables and conditional with respect to the data. The key observation 
is that the joint density is only a function of the finite-dimensional {S(Xf), Lj, 1 < i < «}, which are delivered by 
the EA. Additionally, the intractable normalising constants have been cancelled out. The proof of the theorem is 
given in the Appendix. 

Theorem 1. The joint density of data Y, the p-dimensional parameters 9 and auxiliary variables {Z,-, Xj, ^ 1 < 
i < n), is given below with respect to the 8-independent dominating measure Leb" + " ®;.' =1 (M (A '-> x P (A '>>): 

11 n 

n(Y, 0, {S (Xd, L it l<i<n}) = tt(8) \\ Pa, > V„\ 0) \\ n(L u %, % | V t ._, , V,, 0) = 

!=1 !=1 
( " \ 

7i(0) exp H{x n (0 2 )- 0} - H{x (8 2 )- 8} - [1(8) - 1] (t„ - t ) - J] r CU e)At t 

1=1 / 

x fj |d(V,; e 2 )N& t , {Xi(8 2 ) - x^(8 2 )} r(L r , 8f- f] [l - (l^,. + ^,.(6); &} /r(L r , 8)] J , (16) 

and it admits (2) as a marginal when {Li, Xi, 1 < i < n} is integrated out and Y conditioned upon. 

This density can be targeted by MCMC methods; actually at this stage we are only interested in the conditional 
density given Y, i.e., the joint posterior of parameters and auxiliary variables. We advocate a Gibbs sampler 
variant since conditionally on Y and 8 the auxiliary variables {S(Xi),Lj, 1 < i < n] are independent over i and 
can be generated using the EA. The conditional density of 8 is computable and if it cannot be directly sampled, a 
Metropolis-Hastings step can by employed. Depending on the EA type used to construct the augmentation scheme, 
we distinguish between EMCMC1, 2 and 3. A typical implementation of EMCMC3 is given in Algorithm 3. 



Algorithm 3 EMCMC3 



Choose ff' and set t = . 

For I <i <n, set /, = and repeat the following until /, = 1 , 
sample layer L, of Brownian bridge path X, ~ w <A ' i,0 '°' , 

sample ~ Po[r(L,; 6»')A?,] and = {(^y.By)}^, 1< ;'<«;, uniformly on [0, Af,] X [0, 1] , 
conditionally on L, , sample Brownian bridge Xj^., 

set /; = /(L,,l„(D„y, ; _ 1 ,V, i ,e') as in (14); if /, = 1 , then set Lj +1 = L,-, = Yj, = Xj. 
Sample 8* ~ q(ff ', ■) and U ~ Un[0, 1] , 
if 

n(Y,8',{S(X\ +l ),L'+\Y < i<n}) q(8',ff) 
jT(Y,8',{S(X' i +l ),L' j +l , 1 < i < n\) q(8< ,ff*) 

then set 8 Hl =8', else set 8 H[ = . 
Set / = t + 1 and go to 2 . 



Special cases: EMCMC1 and EMCMC2 

Certain simplifications are feasible when a simpler EA can be applied to the process of interest. The EDA based 
on EA1 requires less imputation than that of EA3, since exact simulation from no longer requires the variable 
L. Thus, the augmentation scheme involves only {X,-, 1 < i < n}. The joint density analogous to (16) can be 
easily obtained and amounts to simply replacing r(L; 8) by r(8). The conditional of 8 trivially follows; originally 
it was given in Theorem 3 of Beskos et al. (2006b). 
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A DA scheme can be built using the auxiliary variables used in EA2. We do not present the details of this, 
since it is not a direct modification of the general scheme, as it is the case with EA1, but instead involves a different 
construction of bridges. Details can be found in Chapter 7 of Sermaidis (2010). 

3.3 Interpreting EDA in terms of PA 

The following result provides the connection between EDA and PA. It effectively shows that the PA scheme is a 
collapsed version of EDA, i.e., when we integrate out a subset of the latent variables we obtain the distribution 
which is targeted by PA. 

Theorem 2. Let n(6, {X,, 1 < i < n) \ Y) be the density obtained from (16) by conditioning on Y, and marginalising 
with respect to {Li, 1 < i < n), and kpa(9, {Xi, 1 < / < «} | Y) the density targeted by the irreducible path 
imputation algorithm defined in (11). Then, the two densities are equal a.s. 

The result is insightful towards the comparison of the computational efficiency of EMCMC to that of AMCMC, 
as it suggests that the mixing time of the former might generally be larger due to its higher degree of augmentation; 
a price one has to pay in order to eliminate the discretisation error. Nonetheless, Section 4 shows a variety of ways 
with which one can increase the performance of EMCMC and achieve good mixing rates. A numerical comparison 
of EMCMC and AMCMC is investigated in Section 6 in concrete examples. 

3.4 Qualitative characteristics of EMCMC and AMCMC 

We can make some qualitative remarks about the efficiency of the MCMC schemes based on HFA and EDA. 
These remarks are based on general properties of DA methods, as for example discussed in Papaspiliopoulos et al. 
(2007); Yu and Meng (201 1) but also the particular structure of the models at hand. These qualitative statements 
are backed up by numerical evidence in Section 6, and they motivate the three approaches we propose in the next 
section to boost the algorithmic efficiency. 

To fix terminology, we will identify DA with a Gibbs-type sampler which updates auxiliary variables and 
parameters according to their conditional distributions. In general, DA works well when the fraction of missing 
information is not too large relative to the observed one, i.e., when the augmented dataset is not considerably more 
informative than the observed regarding the parameters. 

In that respect, both EMCMC and AMCMC become more efficient when the time increment t between a pair 
of observations decreases. In the context of AMCMC, this is due to the fact that as t decreases, the latent bridges 
increasingly look like Brownian bridges, hence they do not carry information about the drift over and above the 
one contained in the observed endpoints and the augmented information converges to the observed one. The 
efficiency of EMCMC improves as the Poisson rate, r(L; 6)t, decreases. In the limiting case when the rate is 0, 
the missing data and parameters are independent, since the skeleton is empty, and EMCMC achieves maximal 
efficiency. 

On the other hand, as t increases, the augmented paths contain increasing amount of information about the 
parameters relative to the information content in the observed data, therefore any algorithm which iteratively 
simulates from the conditional distributions of parameters and missing data will degrade in this sparse-data limit, 
even if the conditional distributions can be simulated exactly and efficiently. EDA has a further weakness over 
HFA because the number of points in a EA skeleton is informative about r{L; 9), and thus about 6. However, we 
can deal with this dependence, which increases with t, by means of a reparametrisation that is described in Section 
4. 

Additionally, both EMCMC and AMCMC suffer at the step of updating missing data given parameters as t 
increases. In EMCMC the acceptance rate of EA decays to exponentially with t, implying that the algorithm can 
spend a large amount of time by simulating proposed paths until acceptance. Similarly in AMCMC the acceptance 
rate of the independent Metropolis-Hastings step decays to at the same rate (expression (11)), suggesting that 
the algorithm will be rarely updating the last accepted path, thus leading to a slower exploration of the state space. 
However, this problem can be improved by resorting to alternative update schemes for the imputation step. One 
option for AMCMC is to use local algorithms, see for example Cotter et al. (2012). An other alternative, which 
can apply to both algorithms, is to update the paths in smaller time segments by imputing 0(t) additional points 
between pairs of observations. This can turn the complexity from exponential to linear in t. 
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4 Boosting EMCMC efficiency 



4.1 Noncentred reparametrisation 

One general approach for improving efficiency of data augmentation in hierarchical models and auxiliary variable 
models is to adopt a reparametrisation. Indeed, we have already done so in HFA and in EDA by transforming 
V — » X as in (12). Following Papaspiliopoulos et al. (2007), for a generic random variable E and data Y, a 
reparametrisation of an augmentation scheme E is defined by any random pair (E, 9) together with a function h 
such that E = h(E, 9, Y), where h need not be 1-1. A reparametrisation is called noncentred when the distribution 
of E is independent of 9. Intuitively, in cases where Y is not strongly informative about E, a noncentred scheme 
can perform well due to the prior independence of E and 9. We will attempt to reduce the dependence between 
the Poisson process and 9 by resorting to a noncentred reparametrisation. 

Noncentred reparametrisations for decoupling the dependence between Poisson processes and their intensity 
were originally proposed in Roberts et al. (2004). Applying their idea in this context, for two observations Vo = 
v, V t = w, if *P is a Poisson process of rate r(L; 9) on [0, t] and T* is a Poisson process of unit intensity on 
[0, t] x [0, oo) with point-coordinates {(jjfj,j;j)} then 

¥ = h(V,L,e) = {t, r ,ij<r(L-,d)}. (17) 

Although T includes an infinite number of points, only depends on those for which the second coordinate is 
below r(L; 9). Accounting for all the observations, the noncentred reparametrisation is {9, {L,, Xj, 1 < ; < n}) — > 
(9, {Lj, Xi, Y,-, 1 < / < «}), where 'F; = h(^i, Li, 9). The theorem below derives the conditional density of 9 given 
the latent variables and observations, and is proved in the Appendix. 

Theorem 3. The conditional density of 9 given the auxiliary variables {Li, Xi, 1 < i < n), n nc {9 \ {Lj,Xj, 9*;, 1 < 
i < n), Y) is proportional to 

n 

n(9) exp [H{x n {6); 9} - H{x (9); 9} - l(9)(t n - t Q )] x ]~[ D(V ti ; 9 2 )N Al , {x,-(fc) - Xj^(9 2 )} 

i=l 

n oo 

x \ [ [l - I [lj < r{U, 9)] </> {%. . + //,;, (0); 9} lr(Li, 9)] . (18) 

''=1 7=1 

Notice that evaluation of (18) for any value of 9 requires only finite computation and therefore discretisations are 
avoided. 

The MCMC algorithm based on this reparametrisation is practically a small modification of that based on the 
original scheme (Algorithm 3). First, we wish to draw from the distribution of {L,, Xj, "T,, 1 < i < n) given Y and the 
current parameter value, say 9' . It is clear from (17) that 'P; need only be revealed on [0, Af,] x [0, r(L,; 9 1 )], which 
involves a finite number of points ~ Po[r(L,-; 9')Ati] and is sufficient for the implementation of the EA. The zth 
output consists of 4*, partially observed at {(tfrtj, fy), 1 < j < k,} and the pair {L,, Xj} discretely observed at times 
if/ij. However, sampling from the distribution of 9 given the latent variables and Y is more tricky. Specifically, 
if the proposed value, say 9* , is such that r(L,; 9*) > r(L,; 9'), then evaluating (18) at 9* requires revealing Z,- at 
additional time points [ipij; r{Li\ 9') < £jj < r(L,-; 9*)}, which have not been revealed in the EA output. Notice that 
this does not occur when r(Z,; 9*) < r(Li, 0). 

We propose two ways to overcome this. The first is based on prospectively revealing 1 T, at any additional 
time instances by simply simulating extra uniform random variates on [0,A?,] x [r(L;; ff), r(L;; 9*)], where 
k* ~ Po {[r(L,-; 9*) - r(Li, 0')] Af,j. The path X t is then filled in at the additional points by Brownian bridge inter- 
polations. The second is closest in spirit to the retrospective nature of the EA. In particular, 9* can be simulated 
prior to the application of the EA and therefore r(L,; #') and r(L,; 9*) are known before the simulation of the la- 
tent path. Consequently, we can simulate T*,- directly on [0, Af,] x [0, r(L,; 9*)] and reveal X, at all required time 
instances during the implementation of the EA. 

In this paper, we adopt the retrospective approach because it can be applied in a similar fashion to all three 
EAs. The prospective approach is simple in the EA1 case due to simple Brownian bridge interpolations, but 
becomes more involved in the EA2 and EA3 cases. 

Again, a simplification can be achieved when EA1 is applicable, where the transformation in that case becomes 
{!;,¥;, 1 < i < n) -> {Xi,^i, l<i< n), where T, = h(W h ff) = $ u ; |y < r(9)}. The full conditional density of 6 
is essentially given by expression (18) replacing r(L,; 9) with r{9). Noncentred reparametrisations for EMCMC2 
also exist; details can be found in Section 7.4.3 of Sermaidis (2010). 
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Algorithm 4 Noncentred EMCMC3 



1 : Choose ff' and set t = . 

2: Sample 6*~q(6',-) and {/ ~ Un[0, 1]. 

3: For 1 <i<n, set /, = and repeat the following until /, = 1 , 

4: sample layer L, of Brownian bridge path 1, ~ W (A, " 00) , and set r max = r(L,-;#')V r(L i ;9'), 

5: sample ic, ~ Po[r max Ati] and Sj,j,|,j)}j> 1< ./'<£, uniformly on [0, Af;] x [0, 1] x [0, r max ] , 

set % = % = m,jJij)})' 

6: set <D, = {¥,,T,}, where ¥, = /i(¥,,L,,0') and T,- = h(Yj, L,, 0') as in (17), 

7: conditionally on L, , sample Brownian bridge X0 r , 

8: set /, =/(L„X„D„y,,._ 1 ,V, i ,0') as in (14); if /, = 1, then set L'+ l =U, = and X'+ l = X, ,. 

9: If 

n„A6* I {L l ^ 1 ,X< +1 ,H" i ¥l , 1 < i < n], Y) q(9* ,0') 
7T„ C (0< | {L| +1 ,X' +1 , l P[ +1 , 1 < i < «}, 7) q(ff,8*) 

then set 0' +1 = 0*, else set = . 
IS : Set ? = t + 1 and go to 2 . 



An interweaving strategy 

When a noncentred transformation is available, it is not necessary to choose between that and the original parametri- 
sation. Yu and Meng (2011) propose instead to interweave the two, by creating a single algorithm which mixes 
steps of both algorithms. This requires practically no extra coding work, but as it is demonstrated in the arti- 
cle, in certain cases the interweaved algorithm outperforms its parent algorithms even when taking the added 
computational cost into account. 

In the EMCMC context, if ff and {IS, X'., *FL 1 < i < n] denote the current state of the chain, then the algorithm 
can effectively be described in four steps. The first two are identical to sampling from the noncentred algorithm, 
i.e., the latent variables are updated to {Lj +1 ,X ( ' +1 , *1'J +1 , 1 < i < n] using the EA and then the parameter is updated 
by drawing ff + ^- conditionally on these latent data and observations. Subsequently, the latent data are transformed 
to their original parametrisation using = h(Wf~ , notice that this step does not impose any computational 
difficulties, since it merely involves a deterministic transformation. Finally, the parameter is re-drawn under the 
original parametrisation, ff +l ~ n(- \ {5(Z ; f+1 ),L[ +1 , 1 < i< n},Y). 

4.2 Auxiliary Poisson sampling 

An alternative way to improve the mixing time by increasing the computational cost is to exploit the connec- 
tion between EDA and PA. For simplicity we present the result for EMCMC 1, and then discuss extensions to 
EMCMC3. 

Note that if r(6) is the Poisson rate, then it is valid to apply EA1 with any Poisson sampling rate R(8) > r(6); 
the acceptance probability is invariant to that choice. Increasing the value of R(ff) leads to an increase in the 
computational cost since the number of points at which the path is evaluated gets larger. Thus, in terms of 
computing time it is optimal to implement the algorithm with the smallest possible R{6). This is not the case 
though for EMCMC 1 that iterates between imputation and estimation. As it turns out, the dependence between 
missing data and parameters decreases with R(0) and optimal implementation in terms of execution time and 
Monte Carlo error can be achieved for R(8) > r{9). Thus, we consider data augmentation where the auxiliary 
variables are chosen according to the output of EA1, as described in Section 3.2, but where the Poisson rate is 
R(6). 

A theoretical result (not included here) goes along the following lines. Let R(ff) = r(6) + A, where A > is a 
user-specified constant independent of 9. Then, the joint law of (9, \X h T*, , 1 < i < n}) after a step of the EMCMC1 
algorithm with parameter A, converges to the law of one step of PA when A — > oo. An argument for proving this is 
based on properties of series expansions for exponential functionals, as discussed for example in Papaspiliopoulos 
(2011). 

Hence, in a sense the auxiliary variables *P, are effectively integrated out by increasing computation and an 
improvement in the convergence of EMCMC is expected as A increases. Similar arguments are valid for the 
noncentred algorithm since it merely involves a reparametrisation of A similar property is enjoyed by a 
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generic EMCMC3. In fact, the limiting algorithm as A increases is a PA which also imputes the layer, although 
the latter is immaterial in the limit since it does not contain additional information about the parameters. 

5 Diffusion observed with error 

The methodology described so far can be easily extended to account for cases where the diffusion is not directly 
observed. We assume that the observations inform only indirectly about the value of the process (1) at discrete 
times ti, i = 0, 1, . . ., n, according to the following observation equation 

Z ti ~q(-\ V ti ,r), 

where Z : = \Z, B , Z ti , . . . , Z tn ] are conditionally independent given Y = {V t0 ,V tl , . . . ,V t J, and q is a known density 
function parametrised by an unknown parameter t. 

The EDA described in Section 3.2 is not appropriate anymore since the end points V t , are not directly observed. 
On the other hand, Theorem 1 can be used to design an augmentation scheme where apart from the auxiliary 
variables involved in EDA, the latent points Y are imputed as well. A direct application of Bayes' theorem yields 
that the joint posterior tt(Y, 9, r, \S{Xj), L„ 1 < i < n} | Z) is proportional to 

n 

n(0, t) tt(Y, {S (Xi), U, 1 < i < n} | 9) n{V h \ 9) \\ q(Y t , | V, t , r), (19) 

(=0 

where the second term is obtained directly from Theorem \, n{- \ 9) is a prior density for the initial point of the 
diffusion process, and n{9, t) is a prior density of the parameters. As before, a direct simplification is available 
when the EA1 can be applied to simulate from Xj. 

A simple scheme for simulating from (19) is by a component-wise updating algorithm. \S(Xj),Lj, 1 < i < 
n] and 9 are simulated conditionally on Y and t, using any of the EMCMC schemes we have proposed, and 
subsequently Y and t conditionally on {S(Xj), L,, 1 < i < «} and 9 according to the conditional derived from (19). 
When n{9, f) - tx{9)ji{t), t is conditionally independent from {S (Xj), L t , 1 < i < n] and 9 given Y, and may have 
a conditional density which can be easily simulated. Simulation of the latent points Y can be done with various 
ways. The simplest is to update them one-at-a-time according to their conditional density, an approach often called 
single-site updating. Such schemes for time series are known to be in general problematic, especially when the 
latent process exhibits high persistence and the observations are not very informative about the latent points, see 
for instance Pitt and Shephard (1999); Papaspiliopoulos et al. (2007). In the example we consider in the next 
section we adopt this simplistic approach since it works quite well. In applications where the process Y exhibits 
very high persistence a joint update of the endpoints can be done using a Metropolis-adjusted Langevin algorithm, 
or more general version of such algorithms, as discussed for example in Girolami and Calderhead (201 1). Other 
possibility is to resort to an overlapping block scheme, as for example in Pitt and Shephard (1999); Golightly and 
Wilkinson (2008). 

6 Numerical investigation of MCM C algorithms 

We investigate the numerical performance of the several algorithms we have presented on some standard examples. 
One is a diffusion which belongs in the Pearson family, see for example Forman and S0rensen (2008), and it is 
an example of a process that can be simulated using the EA1. The second is a univariate double well potential 
model, typical of models that are used to describe processes with metastable behaviour, see for example Metzner 
et al. (2006). This is an example of a process that can only be simulated using the EA3. The third is a double well 
potential model in two dimensions. 

We compare several algorithms. The plain-vanilla EMCMC together which its elaborations: using noncentred 
reparametrisation, the interweaved strategy and auxiliary Poisson sampling. Additionally, we compare against 
both versions of AMCMC described in Section 3.1. The first is the plain one as introduced in Roberts and Stramer 
(200 1 ) and is based on ( 1 0); we refer to this basic version simply as AMCMC. The second eliminates the stochastic 
integrals by integration by parts as in (11) and will be denoted in the text by "AMCMC (int-by-parts)". This is 
expected to enhance algorithmic performance, hence we evaluate the effect of this approach. For both versions, 
we add the suffix -M to indicate the number of imputed points. In general, the quality of each MCMC output 
is assessed with an adjusted effective sample size, defined as the ratio of the effective sample size (ESS) to the 
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computational (CPU) time to ran each algorithm. The adjusted ESS is essentially the number of independent 
draws per second generated by the Markov chain. The ESS is calculated with the R (R Development Core Team, 
2010) package coda (Plummer et at, 2010). 

The existence of EMCMC allows us to have a realistic evaluation of the performance of the biased approaches. 
We carry out bootstrap Kolmogorov-Smirnov tests (Sekhon, 2007) for checking whether the marginal posterior 
distributions on the parameters obtained by different levels of imputation are significantly different from the exact 
samples. These evaluations are based on thinning the original Markov chain output so that to obtain practically 
independent draws from the corresponding distributions. 

6.1 A Pearson diffusion 

Consider the univariate diffusion process specified by 

dV s = -p(V s - n)ds + o~ ^1 + VjdW s , 

where cr > 0, p > is a mean reverting parameter and p e R is the stationary mean. The parameter vectors are 
identified as 6\ = (p, p) T and 62 = cr. This model belongs to a rich class of diffusion processes, known as the 
Pearson diffusions, and admits a stationary distribution with skewness and heavy tails that decay at the same rate 
as those of a f-distribution. 

The unit volatility process is obtained as X s := sinh( V s )/cr, with drift given by 

a(x; 6) = - (£ + ^)tanh((rx) + ^— . 

\cr 21 <x cosh(crx) 

This is a process which belongs in the EA1 class, and exact inference can be performed with 

W = -\ [p + y + y ) , *(Q = \ {p( 6 M + 8) + 3<r 2 + + /i + 1) j . (20) 

We test the methods on a simulated data set from this process, based on n — 1000 (excluding the initial point) 
equidistant points with Af; = 1 , Vo = 1 and parameter values (p,p,cr) = (1/2,1,1/2) (Figure 2a). We have used 
improper prior densities for the parameters, n(p) cc 1, n(p) cc 1, and n{cr) cc l/cr. For all algorithms, sampling 
from the conditional density of the parameters was performed using a block Metropolis-Hastings step. The chains 
were run for 10 5 iterations. 

Figure 3 shows the autocorrelation plots along with posterior density estimates derived from the Markov 
chains. Starting from EMCMC1 under the original parametrisation, notice that for A = the chain exhibits strong 
serial dependence even at large lags, particularly for p. This is due to strong a priori dependence between the pa- 
rameter and the number of Poisson points, as shown in (20), which remains significant in the posterior distribution. 
In particular, the posterior correlation between Ki and p was estimated equal to 0.93, thus suggesting that non- 
centring the Poisson process can result in better mixing rates, as Figure 3d confirms. To improve the performance 
of the exact methods, we consider various values for A = {2, 5, 10), thus revealing the path at additionally 2, 5 and 
10 points between consecutive observations respectively. The increase in performance is reflected in the autocor- 
relation function, which now decays to more quickly. As expected, the chains of the AMCMC algorithms mix 
more rapidly than the exact ones due to the less amount of augmentation. The posterior distributions estimated 
from the HFA algorithms provide evidence of bias even for M — 30 (Figures 3g to 3i). 

Table 1 presents posterior summary statistics. Notice that even for M = 30, the AMCMC algorithm fails 
to pass the Kolmogorov-Smirnov tests at a 5% significance level, whereas less amount of imputation (M = 10) 
combined with the integration by parts yields less biased approximations, clearly illustrating the importance of 
eliminating the stochastic integrals as in (1 1). In terms of computational performance, the interweaved algorithm 
with A - 2 outperforms the rest and exhibits adjusted ESS for fi and cr which are respectively 36% and 23% larger 
than that of the sufficiently accurate AMCMC methods. 

The algorithms were also run using proper priors, an exponential distribution for p, a Gaussian for p and an 
inverse Gamma for cr 2 , yielding no significant differences from the results presented above. 

6.2 A double well potential model 

We consider the solution process to 

dV, = -pV,(v?-n)ds + (rdW„ 
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(a) Pearson 



(b) DWELL, no error 



(c) DWELL, with error 
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(d) MVWELL, 1st coordinate 



(e) MVWELL, 2nd coordinate 
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Figure 2: Simulated datasets from (a) the Pearson model, (b) the double well potential model, (c) the double well observed with error, and 
(d),(e) the two-dimensional double well model. 
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Figure 3: The Pearson diffusion model with n = 1000 simulated data points. True values are (p,/i,cr) = (1/2, 1, 1/2). Autocorrelations are 
reported after a burn in of 5000 iterations. Posterior density estimates using EMCMC1 (interweaved) and AMCMC algorithms for (g) p, (h) /i 
and (i) cr. 
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Method 


Par. 


1 

,( 


M 


Mean 


aJJ 


pec 


T7C C 




Time 


Correlation matrix 


EMCMC1 (noncentred) 


p 





2.316 


0.505 


0.048 


6.599 


16.789 




2.544 


1.000 


-0.447 


0.539 




1-1 






0.995 


0.050 


12.432 


31.629 








1.000 


0.008 




<r 






0.499 


0.015 


12.117 


30.829 










1.000 


EMCMC1 (interweaved) 


P 





2.314 


0.506 


0.049 


6.686 


22.170 




3.316 


1.000 


-0.438 


0.540 




f 






0.994 


0.050 


15.035 


49.859 








1.000 


0.021 




cr 






0.499 


0.015 


14.408 


47.779 










1.000 


EMCMC1 (interweaved) 


P 


2 


4.346 


0.504 


0.048 


11.094 


53.153 




4.791 


1.000 


-0.416 


0.537 




/' 






0.995 


0.050 


18.951 


90.792 








1.000 


0.026 




(r 






0.499 


0.015 


17.022 


81.554 










1.000 


AMCMC-30 


p 




30.000 


0.501 


0.048 


4.916 


80.518 


0.005 


16.380 


1.000 


-0.439 


0.519 




/' 






0.993 


0.050 


5.160 


84.516 


0.017 






1.000 


0.021 




cr 






0.495 


0.014 


5.329 


87.293 


< 0.001 








1.000 


AMCMC-10 (int-by-parts) 


P 




10.000 


0.504 


0.049 


12.898 


81.865 


0.286 


6.347 


1.000 


-0.422 


0.537 




/' 






0.995 


0.050 


13.878 


88.087 


0.749 






1.000 


0.020 




<r 






0.499 


0.015 


13.793 


87.545 


0.583 








1.000 



Table 1: The Pearson diffusion model with n = 1000 simulated data points. True values are (p,/J, cr) = (1/2, 1, 1/2). Statistics are reported 
after a burn-in period of 5000 iterations. The M column shows the average number of imputed points between consecutive observations. The 
SD column shows the standard deviation. The ESS column shows the effective sample size per 1000 iterations. The Time column shows the 
time in seconds required for 1000 iterations of each chain. The adjusted effective sample size is shown in column ESS a( //. The KS column 
shows the p-value for the Kolmogorov-Smimov test with null hypothesis that draws from the exact and approximate marginals come from the 
same distribution. 

where p > Q,p > 0, cr > 0. The parameter vectors are identified as f?i = (p,p) T and 62 = cr. The process is known 
as the double well potential process (denoted by DWELL hereafter). We simulated n = 1000 (excluding the initial 
point) equidistant observations with Af; = 1 for parameter setting (p,p,cr) = (0.1,2, 1/2) and Vo ~ N(2, 1/4) 
(Figure 2b). Reduction to a unit volatility process is easily achieved using X s :- V s /cr, which solves the SDE 

dX s = -pX s (cr 2 X 2 -p)ds + dW s . 

Simple calculations reveal that the function /(m; 9) := \\a(u; 9)\\ 2 + A x H(u; 9)12 is given by 

/ (u; 0) = I {po-V - 2ppcr 2 u 4 + (pp 2 - 3cr 2 ) u 2 + p) , 

and that the algorithms are applicable with 1(9) = /(«/; 9), where 

2 _ 2pp + y/p(pp 2 + 9cr 2 ) 
ipcr 1 

Finally, for a given realisation of the layer L the upper bound (13) is easy found by noticing that 

/ (k; 9) < P - (pcr 4 u 6 + pp 2 u 2 + p) =: g(u; 9), 
which is convex and has a minimum at u = 0, implying that 

r(U 9) = [g [x(9 2 ) ~ L5; 9}v g [y(9 2 ) + LS; 9}] - 1(9). (21) 

We assign improper prior densities to the parameters with n(p) oc 1, n(p) oc 1, and n(cr) oc l/cr, and run the chains 
for 10 5 iterations. The performance of the algorithms and posterior density estimates are shown in Figure 4. 
Notice that the performance of EMCMC3 under the original parametrisation is very poor, due to strong posterior 
correlation between Poisson points and parameters; a fact attributed to the sensitivity of r(L; 9) to the parameters 
(see expression (21)). On the other hand, the noncentred algorithm exhibits a much stronger performance with 
low serial correlation for each parameter even after 50 lags . 

Posterior summaries from the output of the chains and computational performance are gathered in Table 2. 
In contrast to the EA1 example presented earlier, the interweaved strategy, after accounting for the additional 
computational cost, does not offer any significant improvement over the noncentred algorithm. Finally, we found 
that AMCMC with M — 40 paired with integration by parts provides a reasonable approximation to the posterior 
marginal densities and exhibits slightly larger adjusted ESS than the noncentred algorithm with A — 2. The results 
were robust in changes in parameter prior distributions. 
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Figure 4: The DWELL diffusion model with n = 1000 simulated data points. True values are (p,//, cr) = (0.1, 2, 1/2). Autocorrelations are 
reported after a burn in of 5000 iterations. Posterior density estimates using EMCMC3 (noncentred) and AMCMC algorithms for (d) p, (e) /i 
and (f) cr. 
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Figure 5: The DWELL diffusion model observed with error with n = 1000 simulated data points. True values are (p,/i,cr, t) = 
(1/2, 1, 1/2, 1/2). The outputs of the MCMC chains are subsampled every 10 iterations. Autocorrelations are reported after a burn in of 
5000 iterations. Posterior density estimates using EMCMC3 (noncentred) and AMCMC algorithms for (c) p, (d) /i, (e) cr, and (f) r. In (c)-(e) 
we superimpose the posterior density obtained when the same process is directly observed. 



6.3 Noisy observations 

We now illustrate the performance of EMCMC by adding to the observations of the previous example a Gaussian 
error with mean and variance t 2 = 1/4 (Figure 2c). We have assigned the same priors for the diffusion pa- 
rameters as before, and an improper prior proportional to 1 /t for r. We employ EMCMC3 under the noncentred 
parametrisation and the HFA scheme with increasing values of M — {5, 10,20,40}. The algorithms are run for 
10 5 iterations and the MCMC outputs are thinned every 10 iterations. Figure 5 presents autocorrelation plots for 
the parameters along with the marginal posterior density estimates. It is interesting to notice that the presence of 
noise in the observations seems to aid the approximation of the AMCMC methods, since the posterior densities 
do not change significantly as M increases, and seem to provide a reasonable approximation to the exact ones. 
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Method 


Par. 


,( 


M 


Mean 






T7C C 




Time 


Correlation matrix 


EMCMC3 (noncentred) 


P 


2 


8.466 


0.089 


0.010 


2.820 


52.718 




18.695 


1.000 


0.464 


0.370 




l-i 






2.026 


0.161 


3.130 


58.521 








1.000 


-0.004 




<r 






0.495 


0.012 


3.510 


65.616 










1.000 


EMCMC3 (interweaved) 


P 


2 


8.313 


0.089 


0.010 


2.769 


54.197 




19.574 


1.000 


0.465 


0.384 




f 






2.027 


0.160 


3.189 


62.422 








1.000 


0.012 




(r 






0.495 


0.012 


3.655 


71.534 










1.000 


AMCMC-40 


p 




40.000 


0.090 


0.011 


3.337 


73.816 


0.059 


22.123 


1.000 


0.472 


0.386 




/' 






2.029 


0.161 


3.631 


80.322 


0.162 






1.000 


0.026 




<r 






0.494 


0.012 


3.594 


79.502 


< 0.001 








1.000 


AMCMC-40 (int-by-parts) 


P 




40.000 


0.089 


0.010 


3.409 


74.472 


0.082 


21.848 


1.000 


0.467 


0.376 




/< 






2.023 


0.160 


3.916 


85.557 


0.704 






1.000 


0.006 




<r 






0.495 


0.012 


3.796 


82.930 


0.293 








1.000 



Table 2: The DWELL diffusion model with n = 1000 simulated data points. True values are (p,/J, cr) = (0.1, 2, 1/2). Statistics are reported 
after a burn-in period of 5000 iterations. Column details as in Table 1. 



6.4 A bivariate double well potential 

We consider a double well potential process in two dimensions (denote by MVWELL hereafter), solution to 

2 

dV s = - yV.^CV^ds + o-dW s , where G(v) = pi 

and pi,p2,/J.i,/j.2,cr > 0. The parameter vectors are identified as 6\ = {p\,p\,p2,H2) T and 62 = cr. The invariant 
density of the process is proportional to exp {-G(v)} and has two modes, at ( -\f/J\/n2, -\Jia) and (— y[p\lp2, - ■\fp r i)- 
This model belongs in the EA3 class and reduction to a unit volatility process is achieved as X s := V s /cr. The 
lower bound 1(9) and Poisson intensity r(L; 6) are given in the Appendix. 

We simulate n = 1000 equidistant observations with Af, = 1, Vo — (0,0) r and parameters (p\,p.\,p2,^i,cr) 
( 1 2, 2, 1/2, 1, 1/2). The simulated dataset is shown in Figures 2d and 2e. We assign improper priors to the 
parameters with n(pi) oc \,n(pi) oc 1,jt(ct) oc l/cr, i = 1,2. All MCMC chains were run for 10 5 iterations 
and the performance of the algorithms is shown in Figure 6. For A — 5, the EMCMC3 algorithm under the 
original parametrisation performs poorly, whereas the noncentred exhibits a much more rapid mixing. On the 
other hand, the interweaved strategy with A = 2 shows a comparable performance to that of the noncentred. 
Posterior summary statistics from the algorithms are shown in Table 3. From the approximate methods, we found 
that AMCMC-60 paired with integration by parts was the most efficient algorithm which provided a sufficiently 
accurate approximation to the posterior marginal distributions. 

Finally, as we pointed out in Section 2.2.2, the EA3 is inflicted by an additional computational cost due to the 
rejection sampler for the layered Brownian bridge, which is clearly reflected in the CPU time of the interweaved 
algorithm. In particular, a computational profiling of the algorithm showed that approximately 91% of the total 
time was used by EA3, out of which 85% was due to the simulation of layered Brownian bridges. This suggests 
that an alternative more efficient design of the layered Brownian bridge simulation would boost substantially the 
performance of EMCMC3. 

7 Discussion 

We have developed exact data augmentation methods for discretely directly and indirectly observed diffusions. 
We have established the precise connection between this paradigm and the best existing alternative method when 
the variance-stabilising transformation can be performed, the HFA. The empirical comparison of the two methods 
showed that in univariate processes EMCMC can perform at least as well as a sufficiently accurate AMCMC, even 
when ignoring the additional computational cost needed by the latter to determine a good value of M through 
experimentation. For the considered bivariate example, EMCMC is outperformed by AMCMC since the cost 
of the former is dominated by the simulation of the layered Brownian bridges, and thus could be improved by 
considering alternative designs for this simulation. 

We have also pointed out an intriguing connection between exact and approximate methods: the degree of 
freedom rendered by the Poisson sampling rate. On going work involves the rigorous proof of the effect of the 
auxiliary Poisson sampling. The auxiliary sampling can be seen as a variance reduction scheme. In general, 



(v f21 ) 2 -W +p 2 (v f2 »-/, 2 v f11 ) 2 



(22) 



18 





Figure 6: The MVWELL diffusion model with n = 1000 simulated data points. True values are (pi,^i,P2,M2,o~) = (1/2,2, 1/2, 1, 1/2). 
Autocorrelations are reported after a burn in of 5000 iterations. Posterior density estimates using EMCMC3 (interweaved) and AMCMC 
algorithms for (e) pi, (f) fi\, (g) p2, (h) /i2 and (i) <x. 
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1.000 


0.492 


-0.591 


0.016 




P2 
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0.079 


0.558 


33.947 










1.000 


-0.814 


-0.032 




f*2 






1.065 


0.090 


0.584 


35.517 












1.000 


0.019 




tr 






0.503 


0.009 


0.579 


35.224 














1.000 


EMCMC3 (interweaved) 


P\ 
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7.233 


0.467 


0.032 


0.674 


31.721 




47.057 


1.000 


0.192 


0.018 


-0.019 


-0.062 
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1.980 


0.073 


0.751 


35.345 








1.000 


0.482 


-0.582 


0.020 
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0.492 


0.078 


0.872 


41.032 










1.000 


-0.816 


-0.018 




f"2 






1.068 


0.089 


0.936 


44.029 












1.000 


-0.001 




<r 






0.503 


0.009 


0.843 


39.673 














1.000 


AMCMC-60 (int-by-parts) 
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0.467 


0.032 


1.447 


44.949 


0.100 


31.065 


1.000 


0.184 


0.022 


-0.008 


-0.053 
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1.979 


0.072 


1.518 


47.154 


0.831 






1.000 


0.490 


-0.594 


-0.008 




P2 






0.494 


0.079 


1.594 


49.515 


0.361 








1.000 


-0.819 


-0.031 




/<: 






1.066 


0.090 


1.734 


53.862 


0.268 










1.000 


0.027 




tr 






0.503 


0.009 


1.621 


50.371 


0.932 












1.000 



Table 3: The MVWELL diffusion model with n = 1000 simulated data points. True values are (p\,n\,pi,Hi,cr) = (1/2,2, 1/2, 1, 1/2). 
Statistics are reported after a burn-in period of 5000 iterations. Column details as in Table 1. 



there is a large scope for investigating other such schemes both for EDA and HFA. In this article we have already 
demonstrated the effect of performing integration by parts where possible to the efficiency of the algorithms. 

The extension of these methods outside the class of processes prescribed by the current version of EA3 is 
definitely an exciting direction. Another direction of interest for future research is to explore further the connectio n 
between unbiased estimation of transition density and MCMC. There is a large and growing literature which 
develops MCMC algorithms for models with intractable likelihoods using unbiased estimators thereof; see for 
example Andrieu and Roberts (2009); Andrieu et al. (2010). The class of diffusions where such estimators can be 
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obtained is much larger than that simulated by EA3, see for example Section 4.6 of Papaspiliopoulos (201 1). 

There exists available software for implementing all the methods in this paper, which is available on request 
by the authors. 
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8 Appendix 



Proof of Lemma 2 

Proof. Expression (15) is derived by writing the density of the accepted random variables (L, X, <t>) with respect 
to the law of the proposed 



1 

n(L,X,®\v,w,ff) = - — j|: 

a(x, y, t, 0) l_l 



1 



r(L;ff) 



i-tl\ x{ e l) + tly {ei) -e 



and by invoking a change of measure from the law of a Poisson process of intensity r(L; 0) to the law of one 
of unit intensity, thus ensuring a parameter-independent dominating measure. Finally, integrating out the marks 
T = {uj, 1 < j < k}, we obtain expression (15). □ 

Proof of Theorem 1 

Proof. The factorisation of the density in the three terms is elementary. For Vo = v, V, — w and x = 7/(v; 2 ), 
y = r](w; 6 2 ), taking expectations on both sides of (5) with respect to W? we derive the fundamental identity 

p,(x, y; 0) =N t (y- x) exp \H(y; 0) - H(x; 0) - l(0)t} a(x, y, t, 0), 

which combined with (4) gives a(x,y, t, 0) as a function of p,(v, w; 0). Combining this with Lemma 2 yields the 
expression. 

It remains to show that (2) can be obtained by integrating out the auxiliary variables and conditioning on the 
data. However, this is trivial since by taking expectations with respect to the dominating measure 
we obtain the marginal 7i(0) YY1-1 PAr, > V tj ; 0) from which (2) follows as a conditional. □ 

Proof of Theorem 2 

Proof. For notational simplicity, we define K(Y, 0) to be the following deterministic function of observations Y 
and 0: 

n 

exp{H{x„(02); 0) - H{x o (0 2 )- 0} - l(0)(tn - to)} Y] D(V t - 2 )N& ti {xi(0 2 ) - x^(0 2 )} . 



The joint posterior density of and imputed data {£,,-, X,-, 4*/, 1 < i < «} is given (up to a constant) in expression 
(16). Integrating out the Poisson processes is easily done by first integrating out the coordinates and then the 
number of Poisson points. Therefore, integrating out *P; = {if/ij, 1 < j < k,}, 1 < i < n, we obtain 



K(0)K(Y,0)exp 



11 \ " ( 1 r A,i 

- g [r{u 0)-i] Atj Y\ j J o [KU 0)-4> fas + 0}] ds 



Integrating out the Poisson points yields the posterior density of and {L,-, X,-, 1 < i < n] with respect to Leb'' ®" =1 



n{0)K{Y, 0) exp 



n \ n { \ \ r A ' f 

g [r{U; 0) - l] Atj exp — [r(Lr, 0) - <P {X, , + fi i>s (0); o}] ds ■ 



SJ. *^ 



n(0)K(Y0) exp< 



Given the construction of M (A,|) , by integrating out the layers we obtain the joint density of 9 and {X,, 1 < i < n] 
with respect to Leb p ®" =1 W (A '" ' 0) which coincides with the posterior density (11) targeted by PA. □ 
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Proof of Theorem 3 

Proof. For a pair of observations (V fw , V ti ), the joint density of the accepted elements of EA3 (L,-, Xj, '{',•) condi- 
tionally on Vjj_, , Vf,., is given by 

n7 = i (l - I [ly < r(U 6)\ <p (% + //,;, <M): ML ; ; 0)) 
a(x,-i(0 2 ),*;(02),Af,-,0) ' 

with respect to the product measure M (A,,) x L (A,,) , where L w is the measure of a unit rate Poisson process on 
[0, t] x (0, oo). Using the conditional independence of the latent data given Y and 8, 

n 

n M aLt,Xi,% 1 <*'<«} I Y,6) = Y\* x (Li,Xu% I V,,^Vt„0), 
and the proof follows along the same lines as Theorem 1 . □ 
Functions related to the MVWELL model 

Applying the Lamperti transformation to (22), we obtain the drift of the transformed process as 

a(x;0) = V x H(x;0), where H(x;0) = --G{ax). 

Using second partial derivative tests it is straightforward to verify that the function f{u;0) := ||q:(m;#)|| 2 + 
A x H(u; ff)/2 has a global minimum 

= _ 

54^(1+^2) 

where 

Pl := 2fi 2 o- 2 y/2Ti [9 +fij (9 + 2 Pl ^)} 3/2 , pi ■- fa? (54p ljUl (l + - ttftffi + 27p 2 (l + ^) 2 } ■ 
The function / (w; 0) has no local maxima, implying that for a given realisation of the layer L the Poisson rate is 

r(L; 0) = [vt 1 /(« i ; S)]-K0), 

where 

Ul = (x^(0 2 ) - L^6,x^(0 2 ) - L®sf , u 2 = (x {1 \0 2 ) - L^6,yM(0 2 ) + L^sf , 
u 3 = (f } (0 2 ) + L W SJ 2 \0 2 ) - l {2] sf , M4 = (y (1| (0 2 ) + L m 6,y {2 \0 2 ) + L l2] df . 
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